DEVICE AND METHOD FOR CREATING
A SALIENCY MAP OF AN IMAGE

The invention is related to a device and a method for creating a
saliency map of an image.

The human information processing system is intrinsically limited, and
this is especially true of the visual system. In spite of the limits of our
cognitive resources, this system has to cope with the huge amount of
information contained in our visual environment. Nevertheless, and
paradoxically, humans seem to succeed in solving this problem since we are
able to understand our visual environment.

It is commonly assumed that certain visual features are so elementary to
the visual system that they require no attentional resources to be perceived.
These visual features are called pre-attentive features.

According to this tenet of vision research, human attentive behavior is
shared between pre-attentive and attentive processing. Pre-attentive
processing, so-called bottom-up processing, is linked to involuntary
attention: our attention is effortlessly drawn to salient parts of our
view. When considering attentive processing, so-called top-down processing,
it has been shown that our attention is linked to a particular task that we
have in mind. This second form of attention is thus a more deliberate and
powerful one, in that it requires effort to direct our gaze towards a
particular direction.

The detection of the salient points in an image enables the improvement
of subsequent steps such as coding, image indexing, watermarking or video
quality estimation.

The known approaches are more or less based on non-psychovisual
features. In contrast with such methods, the proposed method relies on a
model that is fully based on the human visual system (HVS), for example for
the computation of early visual features.



In a first aspect, the invention proposes a method for creating a
saliency map of an image comprising the steps of:

- projection of said image according to the luminance component
and, if said image is a color image, according to the luminance
component and according to the chrominance components,

- perceptual sub-band decomposition of said components
according to the visibility threshold of a human eye,

- extraction of the salient elements of the sub-bands related to the
luminance component,

- contour enhancement of said salient elements in each sub-band
related to the luminance component,

- calculation of a saliency map from the contour enhancement, for
each sub-band related to the luminance component,

- creation of the saliency map as a function of the saliency maps
obtained for each sub-band.



In a second aspect, the invention proposes a device for creating a
saliency map of an image characterized in that it comprises means for:

- Projecting said image according to the luminance component
and, if said image is a color image, according to the luminance
component and according to the chrominance components,

- Transposing said luminance and chrominance signals into the
frequency domain,

- Decomposing said components of the frequency domain into
perceptual sub-bands according to the visibility threshold of a human
eye,

- Extracting the salient elements of the sub-bands related to the
luminance component,

- Contour enhancing said salient elements in each sub-band
related to the luminance component,

- Calculating a saliency map from the contour enhancement, for
each sub-band related to the luminance component,

- Creating the saliency map as a function of the saliency maps
obtained for each sub-band.

Other characteristics and advantages of the invention will appear
through the description of a non-limiting embodiment of the invention, which
will be illustrated with the help of the enclosed drawings, wherein:

- Figure 1 represents a general flow-chart of a preferred
embodiment of the method according to the invention applied
to a black and white image,

- Figure 2 represents a general flow-chart of a preferred
embodiment of the method according to the invention applied
to a colour image,

- Figure 3 represents the psychovisual spatial frequency
partitioning for the achromatic component,

- Figure 4 represents the psychovisual spatial frequency
partitioning for the chromatic components,

- Figure 5 represents the Dally Contrast Sensitivity Function,

- Figures 6a and 6b represent respectively the visual masking and
a non-linear model of masking,

- Figure 7 represents the flow-chart of the normalisation step
according to the preferred embodiment,

- Figure 8 represents the inhibition/excitation step,

- Figure 9 represents the profile of the filters to model facilitative
interactions for θ=0,

- Figure 10 represents an illustration of the operator D(z),

- Figure 11 represents the chromatic reinforcement step,

- Figure 12 represents the non-CRF inhibition caused by the
adjacent areas of the CRF flanks,

- Figure 13 represents a profile example of the normalized
weighting function for a particular orientation and radial
frequency.



Figure 1 represents the general flow-chart of a preferred embodiment
of the method according to the invention applied to a black and white image.

The algorithm is divided into three main parts.

The first one, named visibility, is based on the fact that the human visual
system (HVS) has a limited sensitivity. For example, the HVS is not able to
perceive with a good precision all signals in our real environment and is
insensitive to small stimuli. The goal of this first step is to take into
account these intrinsic limitations by using perceptual decomposition,
contrast sensitivity functions (CSF) and masking functions.

The second part is dedicated to the perception concept. Perception
is a process that produces from images of the external world a description
that is useful to the viewer and not cluttered with irrelevant information.
To select relevant information, a center-surround mechanism is notably used
in accordance with biological evidence.

The last step concerns some aspects of the perceptual grouping
domain. Perceptual grouping refers to the human visual ability to extract
significant image relations from lower-level primitive image features without
any knowledge of the image content and to group them to obtain meaningful
higher-level structures. The proposed method focuses only on contour
integration and edge linking.

Steps E3 and E4 are executed on the signal in the frequency domain.
Steps E1, E6 and E9 are done in the spatial domain.

Steps E7 and E8 are done in the frequency or spatial domain. If they
are done in the frequency domain, a Fourier transformation has to be carried
out before step E7 and an inverse Fourier transformation has to be carried
out before step E9.

In step E1, the luminance component is extracted from the considered
image.

In step E2, the luminance component is transposed into the frequency
domain by using known transformations such as the Fourier transformation in
order to be able to apply, in step E3, the perceptual sub-band decomposition
on the image.

In step E3, a perceptual decomposition is applied on the luminance
component. This decomposition is inspired by the cortex transform and
based on the decomposition proposed in the document "The computation of
visual bandwidths and their impact in image decomposition and coding",
International Conference and Signal Processing Applications and Technology,
Santa-Clara, California, pp. 776-770, 1993. This decomposition is done
according to the visibility threshold of a human eye.

The decomposition, based on different psychophysics experiments, is
obtained by carving up the frequency domain both in spatial radial frequency
and orientation. The perceptual decomposition of the component A leads to
17 psychovisual sub-bands distributed on 4 crowns as shown on figure 3.

The shaded region on figure 3 indicates the spectral support of the
sub-band belonging to the third crown and having an angular selectivity of 30
degrees, from 15 to 45 degrees.

Four domains (crowns) of spatial frequency are labeled from I to IV:

I: spatial frequencies from 0 to 1.5 cycles per degree;

II: spatial frequencies from 1.5 to 5.7 cycles per degree;

III: spatial frequencies from 5.7 to 14.2 cycles per degree;

IV: spatial frequencies from 14.2 to 28.2 cycles per degree.

The angular selectivity depends on the considered frequency domain.
For low frequencies, there is no angular selectivity.

The main properties of this decomposition and its main differences
from the cortex transform are a non-dyadic radial selectivity and an
orientation selectivity that increases with the radial frequency.
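
As a minimal sketch of the radial partition listed above, the following Python function maps a radial frequency to its crown. The names `CROWN_EDGES`, `CROWN_NAMES` and `crown_of` are illustrative and not part of the described method, and assigning a boundary value to the lower crown is an assumption, since the text does not say which crown owns a boundary.

```python
import bisect

# Upper bounds of crowns I..IV in cycles per degree, as listed in the text.
CROWN_EDGES = [1.5, 5.7, 14.2, 28.2]
CROWN_NAMES = ["I", "II", "III", "IV"]


def crown_of(radial_frequency):
    """Return the crown label for a radial frequency in cy/deg.

    Returns None above 28.2 cy/deg, where no crown is defined.
    """
    if radial_frequency < 0:
        raise ValueError("radial frequency must be non-negative")
    # Assumption: a frequency equal to a boundary belongs to the lower crown.
    index = bisect.bisect_left(CROWN_EDGES, radial_frequency)
    return CROWN_NAMES[index] if index < len(CROWN_NAMES) else None
```

For instance, `crown_of(3.0)` falls in crown II, the crown on which most of the later processing is carried out in the preferred embodiment.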

Each resulting sub-band may be regarded as the neural image
corresponding to a population of visual cells tuned to a range of spatial
frequency and a particular orientation. In fact, those cells belong to the
primary visual cortex (also called striate cortex or V1 for visual area 1). It
consists of about 200 million neurons in total and receives its input from the
lateral geniculate nucleus. About 80 percent of the cells are selective for
orientation and spatial frequency of the visual stimulus.

On the image spatial spectrum, a well-known property of the HVS is
applied, which is known as the contrast sensitivity function (CSF). The CSF
applied is a multivariate function mainly depending on the spatial frequency,
the orientation and the viewing distance.

Biological evidence has shown that visual cells respond to stimuli
above a certain contrast. The contrast value at which a visual cell responds
is called the visibility threshold (above this threshold, the stimulus is
visible). This threshold varies with numerous parameters such as the spatial
frequency of the stimulus, the orientation of the stimulus, the viewing
distance, etc. This variability leads us to the concept of the CSF, which
expresses the sensitivity of the human eye (the sensitivity is equal to the
inverse of the contrast threshold) as a multivariate function. Consequently,
the CSF makes it possible to assess the sensitivity of the human eye for a
given stimulus.
In step E4, a 2D anisotropic CSF designed by Dally is applied. Such a
CSF is described in the document "The visible differences predictor: an
algorithm for the assessment of image fidelity", in Proceedings of SPIE Human
vision, visual processing and digital display III, volume 1666, pages 2-15,
1992.

The CSF enables the modelling of an important property of the
eyes, as the HVS cells are very sensitive to the spatial frequencies.
On figure 5, the Dally CSF is illustrated.

Once the Dally function has been applied, an inverse Fourier
transformation is applied on the signal in step E5, in order to be able to
apply the next step E6.
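
The frequency-domain part of the chain (steps E2, E4 and E5) can be sketched as follows in Python with NumPy. This is only an illustration under stated assumptions: the sub-band decomposition of step E3 is left out, `pixels_per_degree` is a hypothetical parameter standing for the viewing conditions, and `placeholder_csf` is an arbitrary isotropic low-pass weight that is NOT the anisotropic CSF cited in the text.

```python
import numpy as np


def apply_csf(luminance, csf, pixels_per_degree):
    """Weight the spectrum of `luminance` by csf(fx, fy), then invert the FFT.

    fx and fy are expressed in cycles per degree.
    """
    height, width = luminance.shape
    # fftfreq returns cycles per sample; scale to cycles per degree.
    fy = np.fft.fftfreq(height) * pixels_per_degree
    fx = np.fft.fftfreq(width) * pixels_per_degree
    fxx, fyy = np.meshgrid(fx, fy)
    spectrum = np.fft.fft2(luminance)        # step E2: to the frequency domain
    weighted = spectrum * csf(fxx, fyy)      # step E4: apply the CSF weights
    return np.real(np.fft.ifft2(weighted))   # step E5: back to the spatial domain


def placeholder_csf(fx, fy):
    # Hypothetical stand-in: Gaussian low-pass with unit gain at 0 cy/deg.
    return np.exp(-(fx ** 2 + fy ** 2) / (2.0 * 8.0 ** 2))
```

Passing the CSF as a function keeps the sketch independent of any particular sensitivity model; a real implementation would substitute the CSF of step E4.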

For natural pictures, the sensitivity can be modulated (the visibility
threshold increased or decreased) by the presence of another stimulus. This
modulation of the sensitivity of the human eye is called visual masking
and is applied in step E6.

An illustration of the masking effect is shown on figures 6a and 6b. Two
cues are considered, a target and a masker, where Ct and Cm are the contrast
threshold of the target in the presence of the masker and the contrast of the
masker respectively. Moreover, Cto is the contrast threshold measured by a
CSF (without masking effect).

When Cm varies, three regions can be defined:

• At low values of Cm, the detection threshold remains constant. The
visibility of the target is not modified by the masker.

• When Cm tends toward Cto, the masker eases the detection of the
target by decreasing the visibility threshold. This phenomenon is
called the facilitative or pedestal effect.

• When Cm increases, the target is masked by the masker. Its
contrast threshold increases.



The visual masking method is based on the detection of a simple signal
such as sinusoidal patterns.

There are several other methods to achieve the visual masking
modeling based on psychophysics experiments: for instance, another method
refers to the detection of quantization noise.

It is obvious that the preferred method is a strong simplification with
regard to the intrinsic complexity of natural pictures. Nevertheless, numerous
applications (watermarking, video quality assessment) are built around such
a principle, with interesting results compared to the complexity.

In the context of sub-band decomposition, masking has been
intensively studied, leading to the definition of three kinds of masking:
intra-channel masking, inter-channel masking and inter-component masking.

The intra-channel masking occurs between signals having the same
features (frequency and orientation) and consequently belonging to the same
channel. It is the most important masking effect.

The inter-channel masking occurs between signals belonging to
different channels of the same component.

The inter-component masking occurs between channels of different
components (the component A and one chromatic component for example).
These last two kinds of visual masking are put together and are simply called
inter-masking in the following.
For the achromatic component, we used the masking function designed
by Dally in the document entitled "A visual model for Optimizing the Design of
Image Processing Algorithms", in IEEE international conferences on image
processing, pages 16-20, 1994, in spite of the fact that this model doesn't take
into account the pedestal effect. The strength of this model lies in the fact that
it has been optimized with a huge amount of experimental results.

The variation of the visibility threshold is given by:

\[
T_{i,j}(x,y) = \left(1 + \left(k_1 \left(k_2 \left|R_{i,j}(x,y)\right|\right)^{s}\right)^{b}\right)^{1/b}
\]

where R_{i,j} is a psychovisual channel stemming from the perceptual channel
decomposition (for example, the shaded region on figure 3 leads to the
channel R_{III,2}). The values k_1, k_2, s, b are given below:

k_1 = 0.0153
k_2 = 392.5



The table below gives the values of s and b according to the
considered sub-band:

Sub-band    s       b
I           0.75    4
II          1       4
III         0.85    4
IV          0.85    4



We get the signal R'_{i,j}(x,y) at the output of the masking step:

\[
R'_{i,j}(x,y) = R_{i,j}(x,y) / T_{i,j}(x,y)
\]
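
The masking step can be sketched as below. The sketch assumes that the threshold elevation has the form T = (1 + (k1 (k2 |R|)^s)^b)^(1/b), which is how the damaged equation is read here, and uses the constants and the per-crown values of s and b given in the text. The function names are illustrative only.

```python
import numpy as np

K1 = 0.0153
K2 = 392.5
# Values of s per crown and the common value of b, taken from the table.
S_BY_CROWN = {"I": 0.75, "II": 1.0, "III": 0.85, "IV": 0.85}
B = 4.0


def masking_threshold(subband, crown):
    """Threshold elevation T for one sub-band (assumed functional form)."""
    s = S_BY_CROWN[crown]
    return (1.0 + (K1 * (K2 * np.abs(subband)) ** s) ** B) ** (1.0 / B)


def apply_masking(subband, crown):
    """Masked sub-band R' = R / T."""
    return subband / masking_threshold(subband, crown)
```

With this form, T equals 1 where the sub-band is zero and grows with the local amplitude, so strong coefficients are attenuated more than weak ones.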

Then, in step E7, the normalization step extracts the most important
information from the sub-band. Step E7 is detailed on figure 7.

In step S1, a first sub-band R'_{i,j}(x,y) is selected. The steps S2 to S4 and
S8 are carried out for each sub-band R'_{i,j}(x,y) of the 17 sub-bands.
The steps S5 to S7 are done for the second crown (II).

i represents the spatial radial frequency band, i belongs to {I, II, III, IV}.

j represents the orientation, j belongs to {1, 2, 3, 4, 5, 6},

(x,y) represent the spatial coordinates.

In other embodiments, the different steps can be carried out on all the
sub-bands.

Steps S2 and S3 aim to model the behavior of the classical
receptive field (CRF).

The concept of CRF makes it possible to establish a link between a retinal
image and the global percept of the scene. The CRF is defined as a particular
region of the visual field within which an appropriate stimulation (with
preferred orientation and frequency) provokes a relevant response from a
visual cell. Consequently, by definition, a stimulus in the outer region
(called surround) cannot activate the cell directly.

The inhibition and excitation in steps S2 and S3 are obtained by a
Gabor filter, which is sensitive to the orientation and the frequency.

The Gabor filter can be represented as follows:

\[
gabor(x, y, \sigma_x, \sigma_y, f, \theta) = G_{\sigma_x,\sigma_y}(x_\theta, y_\theta) \cos(2\pi f x_\theta)
\]

f being the spatial frequency of the cosine modulation in cycles per
degree (cy/°).

(x_θ, y_θ) are obtained by a translation of the original coordinates
(x_0, y_0) and by a rotation of θ,

\[
\begin{pmatrix} x_\theta \\ y_\theta \end{pmatrix} =
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix}
\]

\[
G_{\sigma_x,\sigma_y}(x,y) = A \exp\left\{ -\left(\frac{x}{\sqrt{2}\,\sigma_x}\right)^2 - \left(\frac{y}{\sqrt{2}\,\sigma_y}\right)^2 \right\}
\]

A representing the amplitude,

σ_x and σ_y representing the width of the gaussian envelope along the x
and y axis respectively.

The excitation is given by:

\[
excitation(x, y, \sigma_x, \sigma_y, f, \theta) =
\begin{cases}
gabor(x, y, \sigma_x, \sigma_y, f, \theta) & \text{if } -1/(4f) \le x_\theta \le 1/(4f) \\
0 & \text{otherwise}
\end{cases}
\]

In order to obtain elliptic shapes, we take different variances σ_x < σ_y.



Finally, we get the output:

\[
R^{Exc}_{i,j}(x,y) = R'_{i,j}(x,y) * excitation(x, y, \sigma_x, \sigma_y, f, \theta)
\]

In step S3, the inhibition is calculated by the following formula:

\[
inhibition(x, y, \sigma_x, \sigma_y, f, \theta) =
\begin{cases}
0 & \text{if } -1/(4f) \le x_\theta \le 1/(4f) \\
\left| gabor(x, y, \sigma_x, \sigma_y, f, \theta) \right| & \text{otherwise}
\end{cases}
\]

And finally:

\[
R^{Inh}_{i,j}(x,y) = R'_{i,j}(x,y) * inhibition(x, y, \sigma_x, \sigma_y, f, \theta)
\]

In step S4, the difference between the excitation and the inhibition is
computed. The positive components are kept, the negative components are set
to "0". This is the following operation:

\[
R''_{i,j}(x,y) = \left| R^{Exc}_{i,j}(x,y) - R^{Inh}_{i,j}(x,y) \right|_{>0}
\]
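
Steps S2 to S4 can be illustrated with the following NumPy sketch. It assumes a Gabor function of the form G(x_θ, y_θ) cos(2π f x_θ) with a Gaussian envelope, a central lobe limited to |x_θ| ≤ 1/(4f), no translation (x_0 = y_0 = 0), and a frequency f expressed in cycles per grid sample rather than per degree. The convolution is circular and done by FFT purely to keep the example short; all function names are hypothetical.

```python
import numpy as np


def gabor_kernels(size, sigma_x, sigma_y, f, theta, amplitude=1.0):
    """Return (excitation, inhibition) kernels on an odd size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinates by theta.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = amplitude * np.exp(-(x_t / (np.sqrt(2.0) * sigma_x)) ** 2
                                  - (y_t / (np.sqrt(2.0) * sigma_y)) ** 2)
    gabor = envelope * np.cos(2.0 * np.pi * f * x_t)
    central = np.abs(x_t) <= 1.0 / (4.0 * f)
    excitation = np.where(central, gabor, 0.0)          # central lobe only
    inhibition = np.where(central, 0.0, np.abs(gabor))  # flanks, rectified
    return excitation, inhibition


def circular_convolve(image, kernel):
    """Circular 2D convolution by FFT; the kernel must fit inside the image."""
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Move the kernel centre to the origin so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


def crf_step(subband, excitation, inhibition):
    """Step S4: keep the positive part of excitation minus inhibition."""
    exc = circular_convolve(subband, excitation)
    inh = circular_convolve(subband, inhibition)
    return np.maximum(exc - inh, 0.0)
```

The two kernels have disjoint supports by construction, which mirrors the complementary conditions in the excitation and inhibition definitions.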



In step S5, for each orientation, for each sub-band of the second
domain, two convolution products are calculated:

\[
L^0_{i,j}(x,y) = R''_{i,j}(x,y) * B^0_{i,j}(x,y)
\]
\[
L^1_{i,j}(x,y) = R''_{i,j}(x,y) * B^1_{i,j}(x,y)
\]

B^0_{i,j}(x,y) and B^1_{i,j}(x,y) are two half-butterfly filters. The profile
of these filters, which allows the modelling of facilitative interactions, is
given for θ=0 on figure 9. These filters are defined by using a
bipole/butterfly filter.

It consists of a directional term D_θ(x,y) and a proximity term
generated by a circle C_r blurred by a gaussian filter G_{σ_x,σ_y}(x,y):

\[
B_{\theta_{i,j},\alpha,r,\sigma}(x,y) = D_{\theta_{i,j}}(x,y) \cdot C_r * G_{\sigma_x,\sigma_y}(x,y)
\]

with

\[
D_{\theta_{i,j}}(x,y) =
\begin{cases}
\cos\left(\frac{\pi/2}{\alpha}\,\varphi\right) & \text{if } \varphi < \alpha \\
0 & \text{otherwise}
\end{cases}
\]

and φ = arctan(y'/x'), where (x',y')^T is the vector (x,y)^T rotated by
θ_{i,j}. The parameter α defines the opening angle 2α of the bipole filter. It
depends on the angular selectivity γ of the considered sub-band. We take
α = 0.4 × γ. The size of the bipole filter is about twice the size of the CRF
of a visual cell.



In step S6, we compute the facilitative coefficient:

\[
f^{iso}_{i,j}(x,y) = D\left( \frac{L^1_{i,j}(x,y) + L^0_{i,j}(x,y)}{\max\left(\beta, \left| L^1_{i,j}(x,y) - L^0_{i,j}(x,y) \right|\right)} \right)
\]

with,

β a constant,

\[
D(z) =
\begin{cases}
0, & z \le s_1 \\
\alpha_1, & z \le s_2 \\
\vdots \\
\alpha_{N-1}, & z > s_{N-1}
\end{cases}
\]

where α_i ≤ 1, i ∈ [0..N-1].

An illustration of the operator D(z) is given on figure 10.

To ease the application of the facilitative coefficient, the operator D(z)
ensures that the facilitative coefficient is piecewise constant, as shown on
figure 10.

In step S7, the facilitative coefficient is applied to the normalized result
obtained in step S4:

\[
R'''_{i,j}(x,y) = R''_{i,j}(x,y) \times \left(1 + f^{iso}_{i,j}(x,y)\right)
\]
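
Steps S6 and S7 can be sketched as follows, assuming that the two half-butterfly responses L0 and L1 are already available as arrays. The thresholds s_k and levels α_k are not given in the text, so the values used in any call are purely hypothetical; the sketch also assumes that the last level applies to every value above the last threshold.

```python
import numpy as np


def staircase(z, thresholds, levels):
    """Piecewise-constant operator D(z).

    levels[k] is returned for thresholds[k-1] < z <= thresholds[k];
    levels must have one more entry than thresholds.
    """
    index = np.searchsorted(thresholds, z, side="left")
    return np.asarray(levels, dtype=float)[index]


def facilitative_coefficient(l0, l1, beta, thresholds, levels):
    """Step S6: quantised ratio of the sum to the difference of L0 and L1."""
    ratio = (l1 + l0) / np.maximum(beta, np.abs(l1 - l0))
    return staircase(ratio, thresholds, levels)


def apply_facilitation(subband, coefficient):
    """Step S7: reinforce the normalised sub-band by (1 + coefficient)."""
    return subband * (1.0 + coefficient)
```

The ratio is large when both half-filters respond with similar strength, that is when the structure continues on both sides of the current location, which is the situation the coefficient is meant to reward.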

Going back to step E8 of figure 1, after step S7 of figure 7, the four
saliency maps obtained for the domain II are combined to get the whole
saliency map according to the following formula:

\[
fixation(x,y) = \alpha \times R'''_{II,0}(x,y) + \beta \times R'''_{II,1}(x,y) + \chi \times R'''_{II,2}(x,y) + \delta \times R'''_{II,3}(x,y)
\]

α, β, χ, δ represent weighting coefficients which depend on the
application (watermarking, coding...).
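
The combination of the four maps of domain II is a plain weighted sum, which might be written as below. The weights are application dependent and no values are given in the text, so any values passed to this hypothetical `fixation_map` helper are assumptions.

```python
import numpy as np


def fixation_map(subbands, weights):
    """Weighted sum of per-orientation maps.

    subbands: array of shape (n_orientations, height, width)
    weights:  sequence of n_orientations weighting coefficients
    """
    subbands = np.asarray(subbands, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if subbands.shape[0] != weights.shape[0]:
        raise ValueError("one weight is required per sub-band")
    # Contract the orientation axis: sum_k weights[k] * subbands[k].
    return np.tensordot(weights, subbands, axes=1)
```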

15 

In other embodiments, the saliency map can be obtained by a
calculation using all 17 sub-bands and not only the sub-bands of
domain II.

Figure 2 represents the general flow-chart of a preferred embodiment 
of the method according to the invention applied to a colour image. 

Steps T1, T4, T'4, T"4, T5 and T8 are done in the spatial domain.

Steps T2, T'2, T"2, T3, T'3, T"3 are done in the frequency domain.

A Fourier transformation is applied on the three components between step
T1 and steps T2, T'2, T"2.

An inverse Fourier transformation is applied between T3, T'3, T"3 and
T4, T'4, T"4 respectively.

Steps T6 and T7 can be done in the frequency or spatial domain. If
they are done in the frequency domain, a Fourier transformation is done on
the signal between steps T5 and T6 and an inverse Fourier transformation is
done between steps T7 and T8.



Step T1 consists in converting the RGB luminances into
Krauskopf's opponent-colors space composed of the cardinal directions A,
Cr1 and Cr2.

This transformation to the opponent-colors space is a way to
decorrelate color information. In fact, it is believed that the brain uses 3
different pathways to encode information: the first conveys the luminance
signal (A), the second the red and green components (Cr1) and the third the
blue and yellow components (Cr2).

These cardinal directions are in close correspondence with
signals stemming from the three types of cones (L,M,S).

Each of the three components RGB firstly undergoes a power-law
non-linearity (called gamma law) of the form x^γ with γ ≈ 2.4. This step is
necessary in order to take into account the transfer function of the display
system. The CIE (French acronym for "Commission Internationale de
l'Éclairage") XYZ tristimulus values, which form the basis for the conversion
to an HVS color space, are then computed by the following equation:



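\[
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} =
\begin{pmatrix}
0.412 & 0.358 & 0.18 \\
0.213 & 0.715 & 0.072 \\
0.019 & 0.119 & 0.95
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
\]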



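The responses of the (L,M,S) cones are computed as follows:

\[
\begin{pmatrix} L \\ M \\ S \end{pmatrix} =
\begin{pmatrix}
0.240 & 0.854 & -0.0448 \\
-0.389 & 1.160 & 0.085 \\
-0.001 & 0.002 & 0.573
\end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
\]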



From the LMS space, one has to obtain an opponent color space.
There is a variety of opponent color spaces, which differ in the way the
different cone responses are combined. From experiments, the color
space designed by Krauskopf has been validated and it is given by the
following transformation:

\[
\begin{pmatrix} A \\ Cr1 \\ Cr2 \end{pmatrix} =
\begin{pmatrix}
1 & 1 & 0 \\
1 & -1 & 0 \\
-0.5 & -0.5 & 1
\end{pmatrix}
\begin{pmatrix} L \\ M \\ S \end{pmatrix}
\]
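
The whole of step T1 can be sketched as a chain of three matrix products after the gamma law. The matrix entries below are those readable in the text; the sign of the first entry of the Cr2 row (taken here as -0.5, so that Cr2 opposes S to the mean of L and M) is an assumption, as is the clipping of the input to [0, 1]. The function name is illustrative.

```python
import numpy as np

RGB_TO_XYZ = np.array([[0.412, 0.358, 0.180],
                       [0.213, 0.715, 0.072],
                       [0.019, 0.119, 0.950]])
XYZ_TO_LMS = np.array([[0.240, 0.854, -0.0448],
                       [-0.389, 1.160, 0.085],
                       [-0.001, 0.002, 0.573]])
# Rows: A = L + M, Cr1 = L - M, Cr2 = S - (L + M) / 2 (sign assumed).
LMS_TO_OPPONENT = np.array([[1.0, 1.0, 0.0],
                            [1.0, -1.0, 0.0],
                            [-0.5, -0.5, 1.0]])


def rgb_to_opponent(rgb, gamma=2.4):
    """Convert RGB values in [0, 1], shape (..., 3), to (A, Cr1, Cr2)."""
    linear = np.clip(rgb, 0.0, 1.0) ** gamma  # display gamma law
    matrix = LMS_TO_OPPONENT @ XYZ_TO_LMS @ RGB_TO_XYZ
    # Apply the combined 3x3 matrix to the last axis of the input.
    return linear @ matrix.T
```

Folding the three matrices into one keeps the per-pixel cost to a single 3x3 product; the intermediate XYZ and LMS values can be recovered by applying the matrices one at a time.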



Then, in step T2, a perceptual decomposition is applied to the
luminance component. Preliminary to step T2 and further to step T1, the
luminance component is transposed into the frequency domain by using
known transformations such as the Fourier transformation in order to be able
to apply, in step T2, the perceptual sub-band decomposition on the image.

The perceptual sub-band decomposition of step T2 is the same as
step E3 of figure 1, described earlier, and thus will not be described again
here.

Concerning the decomposition of the chromatic components Cr2 and Cr1
in steps T'2 and T"2, as shown on figure 4, the decomposition leads to 5
psychovisual sub-bands for each of these components, distributed on 2
crowns. Preliminary to steps T'2, T"2 and further to step T1, the chrominance
components are transposed into the frequency domain by using known
transformations such as the Fourier transformation in order to be able to
apply, in steps T'2 and T"2, the perceptual sub-band decomposition on the
image.

Two domains of spatial frequency are labeled from I to II:

I: spatial frequencies from 0 to 1.5 cycles per degree,

II: spatial frequencies from 1.5 to 5.7 cycles per degree.
In steps T3, T'3 and T"3, a contrast sensitivity function (CSF) is applied.

In step T3, the same contrast sensitivity function as in step E4 of figure 1
is applied on the luminance component and thus will not be described here.

In steps T'3 and T"3, the same CSF is applied on the two chromatic
components Cr1 and Cr2. On the two chromatic components, a
two-dimensional anisotropic CSF designed by Le Callet is applied. It is
described in the document « criteres objectifs avec references de qualite
visuelle des images couleurs » of Mr Le Callet, university of Nantes, 2001.

This CSF uses two low-pass filters with a cut-off frequency of about 5.5
cycles per degree and 4.1 cycles per degree respectively for the Cr1 and Cr2
components.

In order to permit the direct comparison between early visual features
stemming from different visual modalities (achromatic and chromatic
components), the sub-bands related to the visibility are weighted, the
visibility threshold being defined as the contrast of a stimulus at a
particular point for which the stimulus just becomes visible.

An inverse Fourier transformation is then applied on the different
components (not shown on figure 2) in order to be able to apply the masking
in the spatial domain.

Then, an intra masking is applied on the different sub-bands for the
chromatic components Cr1 and Cr2 during steps T'4 and T"4 and for the
achromatic component in step T4. This last step has already been explained
in the description of figure 1, step E6. Thus, it will not be described again
here.

Intra-channel masking is incorporated as a weighting of the outputs of
the CSF function. Masking is a very important phenomenon in perception as it
describes interactions between stimuli. In fact, the visibility threshold of a
stimulus can be affected by the presence of another stimulus.

Masking is strongest between stimuli located in the same perceptual
channel or in the same sub-band. We apply the intra masking function
designed by Dally on the achromatic component as described for figure 1,
step E6 and, on the color components, the intra masking function described in
the document of P. Le Callet and D. Barba, "Frequency and spatial pooling of
visual differences for still image quality assessment", in Proc. SPIE Human
Vision and Electronic Imaging Conference, San Jose, CA, Vol. 3959, January
2000.

These masking functions consist of a non-linear transducer as expressed
in the document of Legge and Foley, "Contrast Masking in Human Vision",
Journal of the Optical Society of America, Vol. 70, N° 12, pp. 1458-1471,
December 1980.

Visual masking is strongest between stimuli located in the same
perceptual channel (intra-channel masking). Nevertheless, as shown in
numerous studies, there are several interactions called inter-component
masking providing a masking or a pedestal effect. From psychophysics
experiments, significant inter-component interactions involving the chromatic
components have been identified. Consequently, the sensitivity of the
achromatic component could be increased or decreased by the Cr1 component.
The influence of Cr2 on the achromatic component is considered
insignificant. Finally, Cr1 can also modulate the sensitivity of the Cr2
component (and vice versa).

Then, in step T5, a chromatic reinforcement is done.

Colour is one of the strongest attractors of attention and the
invention takes advantage of this attraction strength by putting forward
the following property: the existence of regions showing a sharp colour and
fully surrounded by areas having quite different colours implies a particular
attraction to the borders of this region.

To avoid the difficult issue of aggregating measures stemming from
achromatic and chromatic components, the colour facilitation consists in
enhancing the saliency of achromatic structures by using a facilitative
coefficient computed on the low frequencies of the chromatic components.

In the preferred embodiment, only a sub-set of the set of
achromatic channels is reinforced. This subset contains 4 channels having an
angular selectivity equal to π/4 and a spatial radial frequency (expressed in
cyc/deg) belonging to [1.5, 5.7]. One notes these channels R_{i,j} where i
represents the spatial radial frequency and j pertains to the orientation. In
our example, j is equal to {0, π/4, π/2, 3π/4}. In order to compute
a facilitative coefficient, one determines for each pixel of the low frequency
of Cr1 and Cr2 the contrast value related to the content of adjacent areas and
to the current orientation of the reinforced achromatic channel, as
illustrated on figure 11. On figure 11, the contrast value is obtained by
computing the absolute difference between the average value of the set A and
the average value of the set B. The sets A and B belong to the low frequency
of Cr1 or Cr2 and are oriented in the preferred orientation of the considered
achromatic channel.

The chromatic reinforcement is achieved by the following equation, on
an achromatic (luminance) channel R_{i,j}(x,y):

\[
R^{(1)}_{i,j}(x,y) = R_{i,j}(x,y) \times \left(1 + |A - B|_{Cr1} + |A - B|_{Cr2}\right)\Big|_{i=II}
\]

where,

R^{(1)}_{i,j}(x,y) represents the reinforced achromatic sub-band,
R_{i,j}(x,y) represents an achromatic sub-band,

|A - B|_k represents the contrast value computed around the current
point on the chromatic component k in the preferred orientation of the
sub-band R_{i,j}(x,y), as shown on figure 11. In the embodiment, the sets A
and B belong to the sub-band of the first crown (low frequency sub-band) of
the chromatic component k with an orientation equal to π/4.
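
A rough sketch of the chromatic reinforcement is given below. The geometry of the sets A and B is only shown on figure 11, so this example makes a strong simplifying assumption: each set is reduced to the single pixel located at plus or minus a hypothetical offset (dy, dx) from the current point, with wrap-around at the image borders. The reinforcement itself follows the form R × (1 + |A - B| on Cr1 + |A - B| on Cr2).

```python
import numpy as np


def side_contrast(chroma_low, dy, dx):
    """|A - B| with A and B taken at +(dy, dx) and -(dy, dx) (assumption).

    np.roll wraps around the borders, which a real implementation would
    probably replace by explicit border handling.
    """
    side_a = np.roll(chroma_low, (dy, dx), axis=(0, 1))
    side_b = np.roll(chroma_low, (-dy, -dx), axis=(0, 1))
    return np.abs(side_a - side_b)


def chromatic_reinforcement(achromatic, contrast_cr1, contrast_cr2):
    """Reinforce an achromatic sub-band by the two chromatic contrasts."""
    return achromatic * (1.0 + contrast_cr1 + contrast_cr2)
```

On a uniformly coloured area both contrasts vanish and the achromatic sub-band is left unchanged, so only the borders of coloured regions are reinforced.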



In other embodiments, all the sub-bands can be considered.

In step T6, a center/surround suppressive interaction is carried out.
This operation consists first in a step of inhibition/excitation.

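A two-dimensional difference-of-Gaussians (DoG) is used to model the
non-CRF inhibition behavior of cells. The
DoG_{σ_x^{ex},σ_y^{ex},σ_x^{inh},σ_y^{inh}}(x,y) is given by the
following equation:

\[
DoG_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x,y) = G_{\sigma_x^{inh},\sigma_y^{inh}}(x,y) - G_{\sigma_x^{ex},\sigma_y^{ex}}(x,y)
\]

with

\[
G_{\sigma_x,\sigma_y}(x,y) = \frac{1}{2\pi(\sigma_x\sigma_y)^2} \exp\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}\right)
\]

a two-dimensional gaussian.

Parameters (σ_x^{ex}, σ_y^{ex}) and (σ_x^{inh}, σ_y^{inh}) correspond to the
spatial extents of the Gaussian envelope along the x and y axes of the central
Gaussian (the CRF center) and of the inhibitory Gaussian (the surround)
respectively. These parameters have been experimentally determined in
accordance with the radial frequency of the second crown (the radial
frequency f ∈ [1.5, 5.7] is expressed in cycles/degree). Finally, the
non-classical surround inhibition can be modeled by the normalized weighting
function w_{σ_x^{ex},σ_y^{ex},σ_x^{inh},σ_y^{inh}}(x,y) given by the
following equation:

\[
w_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x,y) = \frac{1}{\left\| H\left(DoG_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}\right) \right\|_1} H\left(DoG_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x',y')\right)
\]

with,

\[
H(z) =
\begin{cases}
0, & z < 0 \\
z, & z \ge 0
\end{cases}
\]

(x',y') is obtained by translating the original coordinate system by (x_0,y_0)
and rotating it by θ_{i,j} expressed in radians,

\[
\begin{pmatrix} x' \\ y' \end{pmatrix} =
\begin{pmatrix} \cos\theta_{i,j} & \sin\theta_{i,j} \\ -\sin\theta_{i,j} & \cos\theta_{i,j} \end{pmatrix}
\begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix}
\]

‖·‖_1 denotes the L1 norm, i.e. the absolute value.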



Figure 12 shows the structure of non-CRF inhibition.
Figure 13 shows a profile example of the normalized weighting
function w_{σ_x^{ex},σ_y^{ex},σ_x^{inh},σ_y^{inh}}(x,y).



The response R^{(2)}_{i,j}(x,y) of cortical cells to a particular sub-band
R^{(1)}_{i,j}(x,y) is computed by the convolution of the sub-band
R^{(1)}_{i,j}(x,y) with the weighting function
w_{σ_x^{ex},σ_y^{ex},σ_x^{inh},σ_y^{inh}}(x,y):

\[
R^{(2)}_{i,j}(x,y) = H\left(R^{(1)}_{i,j}(x,y) - R^{(1)}_{i,j}(x,y) * w_{\sigma_x^{ex},\sigma_y^{ex},\sigma_x^{inh},\sigma_y^{inh}}(x,y)\right)\Big|_{i=II}
\]

with H(z) defined as described above.
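
The surround inhibition of step T6 can be sketched as follows. The sketch assumes that the DoG is the inhibitory Gaussian minus the central Gaussian, that each Gaussian is normalised by 2π(σ_x σ_y)², which is how the damaged expression is read here, and that the weights are obtained by half-wave rectification followed by L1 normalisation. The convolution is circular and FFT-based only for brevity, and the function names and the values of the widths are hypothetical.

```python
import numpy as np


def gaussian2d(x, y, sigma_x, sigma_y):
    # Normalisation as read from the text (assumption).
    norm = 2.0 * np.pi * (sigma_x * sigma_y) ** 2
    return np.exp(-x ** 2 / (2.0 * sigma_x ** 2)
                  - y ** 2 / (2.0 * sigma_y ** 2)) / norm


def surround_weights(size, sigma_ex, sigma_inh, theta=0.0):
    """Normalised weights w = H(DoG) / ||H(DoG)||_1 on an odd square grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    dog = gaussian2d(x_r, y_r, *sigma_inh) - gaussian2d(x_r, y_r, *sigma_ex)
    positive = np.maximum(dog, 0.0)  # H(.): keep the surround ring only
    total = positive.sum()
    if total == 0.0:
        raise ValueError("grid too small: the surround never exceeds the centre")
    return positive / total


def circular_convolve(image, kernel):
    """Circular 2D convolution by FFT; the kernel must fit inside the image."""
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


def surround_inhibition(subband, weights):
    """R2 = H(R1 - R1 * w): subtract the surround average, keep positives."""
    return np.maximum(subband - circular_convolve(subband, weights), 0.0)
```

Because the weights sum to one, a perfectly uniform sub-band is entirely suppressed, while a response that stands out from its surround is kept.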



In step T7, a facilitative interaction is carried out.

This facilitative interaction is usually termed contour enhancement or
contour integration.

Facilitative interactions appear outside the CRF along the preferred
orientation axis. These kinds of interactions are maximal when center and
surround stimuli are iso-oriented and co-aligned. In other words, as shown by
several physiological observations, the activity of a cell is enhanced when
the stimulus within the CRF and a stimulus within the surround are linked to
form a contour.

Contour integration in early visual preprocessing is simulated using
two half-butterfly filters B^0_{i,j} and B^1_{i,j}. The profiles of these
filters are shown on figure 9 and they are defined by using a bipole/butterfly
filter. It consists of a directional term D_θ(x,y) and a proximity term
generated by a circle C_r blurred by a gaussian filter G_{σ_x,σ_y}(x,y):

\[
B_{\theta_{i,j},\alpha,r,\sigma}(x,y) = D_{\theta_{i,j}}(x,y) \cdot C_r * G_{\sigma_x,\sigma_y}(x,y)
\]

with

\[
D_{\theta_{i,j}}(x,y) =
\begin{cases}
\cos\left(\frac{\pi/2}{\alpha}\,\varphi\right) & \text{if } \varphi < \alpha \\
0 & \text{otherwise}
\end{cases}
\]

and φ = arctan(y'/x'), where (x',y')^T is the vector (x,y)^T rotated by
θ_{i,j}. The parameter α defines the opening angle 2α of the bipole filter. It
depends on the angular selectivity γ of the considered sub-band. One takes
α = 0.4 × γ. The size of the bipole filter is about twice the size of the CRF
of a visual cell.

The two half-butterfly filters B^0_{i,j} and B^1_{i,j} are then deduced from
the butterfly filter by using appropriate windows.

For each orientation, sub-band and location, one computes the
facilitative coefficient:

\[
f^{iso}_{i,j}(x,y) = D\left( \frac{L^1_{i,j}(x,y) + L^0_{i,j}(x,y)}{\max\left(\beta, \left| L^1_{i,j}(x,y) - L^0_{i,j}(x,y) \right|\right)} \right)
\]

with,

β a constant,

\[
L^0_{i,j}(x,y) = R^{(2)}_{i,j}(x,y) * B^0_{i,j}(x,y),
\]
\[
L^1_{i,j}(x,y) = R^{(2)}_{i,j}(x,y) * B^1_{i,j}(x,y),
\]

\[
D(z) =
\begin{cases}
0, & z \le s_1 \\
\alpha_1, & z \le s_2 \\
\vdots \\
\alpha_{N-1}, & z > s_{N-1}
\end{cases}
\]

where α_i ≤ 1, i ∈ [0..N-1].

An illustration of the operator D(z) is given on figure 10.

The sub-band R^{(3)}_{i,j} resulting from the facilitative interaction is
finally obtained by weighting the sub-band R^{(2)}_{i,j} by a factor depending
on the ratio of the local maximum of the facilitative coefficient
f^{iso}_{i,j}(x,y) and the global maximum of the facilitative coefficient
computed on all sub-bands belonging to the same range of spatial frequency:

\[
R^{(3)}_{i,j}(x,y) = R^{(2)}_{i,j}(x,y) \times \left(1 + \eta^{iso} \times \frac{\max_{(x,y)}\left(f^{iso}_{i,j}(x,y)\right)}{\max_{j}\left(\max_{(x,y)}\left(f^{iso}_{i,j}(x,y)\right)\right)} \times f^{iso}_{i,j}(x,y)\right)\Big|_{i=II}
\]

From a standard butterfly shape, this facilitative factor makes it possible
to improve the saliency of isolated straight lines. η^{iso} controls the
strength of this facilitative interaction.
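
The weighting described above can be sketched as follows, assuming that the factor has the form 1 + η × (local maximum / global maximum) × f, where the local maximum is taken over one sub-band and the global maximum over all orientations of the same crown. The function name, the array layout and the value of η in any call are hypothetical.

```python
import numpy as np


def weight_by_facilitation(subbands, coefficients, eta):
    """Apply the facilitative weighting to all orientations of one crown.

    subbands, coefficients: arrays of shape (n_orientations, height, width)
    eta: strength of the facilitative interaction
    """
    # Largest coefficient of each sub-band, kept broadcastable.
    local_max = coefficients.max(axis=(1, 2), keepdims=True)
    global_max = local_max.max()
    if global_max == 0.0:
        # No facilitation anywhere: the sub-bands are returned unchanged.
        return subbands.copy()
    return subbands * (1.0 + eta * (local_max / global_max) * coefficients)
```

The ratio of maxima scales down the reinforcement of orientations whose strongest facilitation is weak compared with the dominant orientation of the crown.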

In step T8, the saliency map is obtained by summing all the resulting
sub-bands obtained in step T7.



In other embodiments of the invention, one can use all the sub-bands
and not only the sub-bands of the second crown.



Although cortical cells tuned to horizontal and vertical orientations are
almost as numerous as cells tuned to other orientations, we don't introduce
any weighting. This feature of the HVS is implicitly mimicked by the
application of the 2D anisotropic CSF.



